Abstract

Recently there has been growing research interest in tabling in the logic programming
community because of its usefulness in a variety of application domains including program 
analysis, parsing, deductive databases, theorem proving, model checking, and logic-based 
probabilistic learning. The main idea of tabling is to memorize the answers to some sub- 
goals and use the answers to resolve subsequent variant subgoals. Early resolution mech- 
anisms proposed for tabling such as OLDT and SLG rely on suspension and resumption 
of subgoals to compute fixpoints. Recently, the iterative approach named linear tabling 
has received considerable attention because of its simplicity, ease of implementation, and 
good space efficiency. Linear tabling is a framework from which different methods can be 
derived based on the strategies used in handling looping subgoals. One decision concerns 
when answers are consumed and returned. This paper describes two strategies, namely, 
lazy and eager strategies, and compares them both qualitatively and quantitatively. The 
results indicate that, while the lazy strategy has good locality and is well suited for find- 
ing all solutions, the eager strategy is comparable in speed with the lazy strategy and is 
well suited for programs with cuts. Linear tabling relies on depth-first iterative deepening 
rather than suspension to compute fixpoints. Each cluster of inter-dependent subgoals as 
represented by a top-most looping subgoal is iteratively evaluated until no subgoal in it 
can produce any new answers. Naive re-evaluation of all looping subgoals, albeit simple, 
may be computationally unacceptable. In this paper, we also introduce semi-naive opti- 
mization, an effective technique employed in bottom-up evaluation of logic programs to 
avoid redundant joins of answers, into linear tabling. We give the conditions for the tech- 
nique to be safe (i.e. sound and complete) and propose an optimization technique called 
early answer promotion to enhance its effectiveness. Benchmarking in B-Prolog
demonstrates that, with this optimization, linear tabling compares favorably in speed with
the state-of-the-art implementation of SLG.

KEYWORDS: Prolog, Semi-naive evaluation, Recursion, Tabling, Memoization, Linear
tabling.

1 Introduction

The SLD resolution used in Prolog may not be complete or efficient for programs 
in the presence of recursion. For example, for a recursive definition of the transitive 
closure of a relation, a query may never terminate under SLD resolution if the pro- 
gram contains left-recursion or the graph represented by the relation contains cycles 
even if no rule is left-recursive. For a natural definition of the Fibonacci function, 
the evaluation of a subgoal under SLD resolution spawns an exponential number 
of subgoals, many of which are variants. The lack of completeness and efficiency in 
evaluating recursive programs is problematic: novice programmers may lose confi- 
dence in writing declarative programs that terminate and real programmers have to 
reformulate a natural and declarative formulation to avoid these problems, resulting 
in cluttered programs. 
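
The Fibonacci observation can be reproduced outside Prolog. The following is a minimal sketch in Python, used here only as an illustration language; the function names and the call counter are assumptions made for this example and do not come from any tabling system. It counts how many times the recursive definition is entered with and without a table of previously computed results.

```python
def fib_plain(n, counter):
    """Direct recursion: every call re-derives its sub-results."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_plain(n - 1, counter) + fib_plain(n - 2, counter)


def fib_tabled(n, counter, table=None):
    """Same definition, but each distinct argument is evaluated only once."""
    if table is None:
        table = {}
    if n in table:                 # a variant call: reuse the stored answer
        return table[n]
    counter[0] += 1                # counts evaluations, not table lookups
    if n < 2:
        table[n] = n
    else:
        table[n] = fib_tabled(n - 1, counter, table) + fib_tabled(n - 2, counter, table)
    return table[n]


plain_calls, tabled_calls = [0], [0]
print(fib_plain(20, plain_calls), plain_calls[0])
print(fib_tabled(20, tabled_calls), tabled_calls[0])
```

The plain version enters the definition a number of times that grows exponentially with n, whereas the tabled version evaluates each distinct argument once.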

Tabling (Tamaki and Sato 1986; Warren 1992) is a technique that can get rid of
infinite loops for bounded-term-size programs and redundant computations in the
execution of recursive programs. The main idea of tabling is to memorize the
answers to subgoals and use the answers to resolve their variant descendants. Tabling
helps narrow the gap between declarative and procedural readings of logic
programs. It is not only useful in the problem domains that motivated its birth, such as
program analysis (Dawson et al. 1996), parsing (Eisner et al. 2004; Johnson 1995;
Warren 1999), deductive databases (Liu 1999; Ramakrishnan and Ullman 1995; Sagonas et al. 1994),
and theorem proving (Nielson et al. 2004; Pientka 2003), but has also been found
essential in several other problem domains such as model checking (Ramakrishnan 2002)
and logic-based probabilistic learning (Sato and Kameya 2001; Zhou et al. 2003).
This idea of caching previously calculated solutions, called memoization, was first
used to speed up the evaluation of functions (Michie 1968). OLDT (Tamaki and Sato 1986)
is the first resolution mechanism that accommodates the idea of tabling in logic
programming and XSB is the first Prolog system that successfully supports tabling
(Sagonas and Swift 1998). Tabling has become a practical technique thanks to the
availability of large amounts of memory in computers. It has become an embedded
feature in a number of other logic programming systems such as B-Prolog
(Zhou et al. 2000; Zhou et al. 2004), Mercury (Somogyi and Sagonas 2006), TALS
(Guo and Gupta 2001), and YAP (Rocha et al. 2005b).



OLDT, like SLG (Chen and Warren 1996), is non-linear in the sense that
the state of a consumer must be preserved before execution backtracks to its
producer. This non-linearity requires freezing stack segments (Sagonas and Swift 1998)
or copying stack segments into a different area (Demoen and Sagonas 1999) before
backtracking takes place. Linear tabling is an alternative tabling scheme (Shen et al. 2001;
Zhou et al. 2000; Zhou and Sato 2003; Zhou et al. 2004). The main idea of linear
tabling is to use iterative computation of looping subgoals rather than suspension
and resumption of them as is done in OLDT to compute fixpoints. This basic idea
dates back to the ET* algorithm (Dietrich 1987). The DRA method proposed in
(Guo and Gupta 2001) is based on the same idea but employs different strategies for
handling looping subgoals and clauses. In linear tabling, a cluster of inter-dependent
subgoals as represented by a top-most looping subgoal is iteratively evaluated until
no subgoal in it can produce any new answers. Linear tabling is relatively easy
to implement on top of a stack machine thanks to its linearity, and is more space
efficient than OLDT since the states of subgoals need not be preserved.

Linear tabling is a framework from which different methods can be derived based 
on the strategies used in handling looping subgoals. One decision concerns when 
answers are consumed and returned. The lazy strategy postpones the consumption 
of answers until no answers can be produced. It is in general space efficient because 
of its locality and is well suited for all-solution search programs. The eager strategy, 
in contrast, prefers answer consumption and return over production. It is well suited 
for programs with cuts. These two strategies have been compared in SLG-WAM
as two scheduling strategies called local and single-stack (Freire et al. 1998). This
paper gives a comprehensive analysis of these two strategies and compares their 
performance experimentally. 

Linear tabling relies on iterative evaluation of top-most looping subgoals to
compute fixpoints. Naive re-evaluation of all looping subgoals may be computationally
expensive. Semi-naive optimization is an effective technique used in bottom-up
evaluation of Datalog programs (Bancilhon and Ramakrishnan 1986; Ullman 1988). It
avoids redundant joins by ensuring that the join of the subgoals in the body of
each rule involves at least one new answer produced in the previous round.
The impact of semi-naive optimization on top-down evaluation had been unknown
before (Zhou et al. 2004). In this paper, we also propose to introduce semi-naive
optimization into linear tabling. We have made efforts to properly tailor semi-naive
optimization to linear tabling. In our semi-naive optimization, answers for each
tabled subgoal are divided into three regions as in bottom-up evaluation, but
answers are consumed sequentially until exhaustion, not incrementally as in bottom-up
evaluation, so that answers produced in a round are consumed in the same round.
We have found that incremental consumption of answers does not fit linear tabling
since it may require more iterations to reach fixpoints. Moreover, consuming answers
incrementally may cause redundant consumption of answers. We further propose a
technique called early promotion of answers to reduce redundant consumption of
answers. Our benchmarking shows that this technique gives significant speed-ups
to some programs.

An efficient tabling system has been implemented in B-Prolog,¹ in which the
lazy strategy is employed by default but the eager strategy can be used through
declarations for subgoals that are in the scopes of cuts or are not required to return
all the answers. Our tabling system not only consumes considerably less stack space
than XSB for some programs but also compares favorably in speed with XSB.

The theoretical framework of linear tabling is given in (Shen et al. 2001). The
main objective of this paper is to propose evaluation strategies and their
optimizations for linear tabling. The remainder of the paper is structured as follows: In the
next section we define the terms used in this paper. In Section 3 we give the linear
tabling framework and the two answer consumption strategies. In Section 4 we
introduce semi-naive optimization into linear tabling and prove its completeness. In
Section 5 we describe the implementation of our tabling system and also show how
to implement semi-naive optimization. In Section 6 we compare the tabling
strategies experimentally, evaluate the effectiveness of semi-naive optimization, and also
compare the performance of B-Prolog with XSB. In Section 7 we survey the related
work and in Section 8 we conclude the paper.

¹ www.bprolog.com



2 Preliminaries 

In this section we define the terms used, to make this paper as self-contained
as possible. The reader is referred to (Lloyd 1988) for a description of
SLD resolution. In this paper, we always assume the top-down strategy for selecting
clauses and the left-to-right computation rule.

Let P be a program. Tabled predicates in P are explicitly declared and all the 
other predicates are assumed to be non-tabled. A subgoal of a tabled predicate is 
called a tabled subgoal. Tabled predicates are transformed into a form that facilitates 
execution: each rule ends with a dummy subgoal named memo(H) where H is the
head, and each tabled predicate contains a dummy ending rule whose body contains 
only one subgoal named check_completion(H). For example, given the definition of 
the transitive closure of a relation, 

:- table p/2.

p(X,Y) :- p(X,Z), e(Z,Y).

p(X,Y) :- e(X,Y).

The transformed predicate is as follows:

p(X,Y) :- p(X,Z), e(Z,Y), memo(p(X,Y)).
p(X,Y) :- e(X,Y), memo(p(X,Y)).
p(X,Y) :- check_completion(p(X,Y)).

A table is used to record subgoals and their answers. For each subgoal and its 
variants, there is an entry in the table that stores the state of the subgoal (e.g., 
complete or not) and an answer table for holding the answers generated for the 
subgoal. Initially, the answer table is empty. 

Definition 1 

Let $t_1$ and $t_2$ be two terms with no shared variables. The term $t_1$ subsumes $t_2$ if
there exists a substitution $\theta$ such that $t_1\theta = t_2$. The two terms $t_1$ and $t_2$ are called
variants if they subsume each other.
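
The variant test and the table keyed by variants can be sketched as follows. This is a minimal illustration in Python under stated assumptions: terms are nested tuples with the functor first, variables are strings that start with an uppercase letter or an underscore, and the names `canonical`, `are_variants` and `Table` are hypothetical helpers rather than part of any tabling system. Two terms are variants exactly when renaming their variables in order of first occurrence yields the same term.

```python
def is_var(term):
    """Assumed convention: variables are strings starting with uppercase or '_'."""
    return isinstance(term, str) and (term[:1].isupper() or term[:1] == "_")


def canonical(term, names=None):
    """Rename variables to _G0, _G1, ... in order of first occurrence."""
    if names is None:
        names = {}
    if is_var(term):
        return names.setdefault(term, "_G%d" % len(names))
    if isinstance(term, tuple):
        return (term[0],) + tuple(canonical(arg, names) for arg in term[1:])
    return term                      # atom or number


def are_variants(t1, t2):
    return canonical(t1) == canonical(t2)


class Table:
    """One entry per subgoal up to variance: a state plus an answer table."""

    def __init__(self):
        self.entries = {}

    def entry(self, subgoal):
        key = canonical(subgoal)
        if key not in self.entries:
            self.entries[key] = {"state": "incomplete", "answers": []}
        return self.entries[key]

    def add_answer(self, subgoal, answer):
        """Return True if the answer is new for this subgoal."""
        answers = self.entry(subgoal)["answers"]
        key = canonical(answer)
        if key in answers:
            return False
        answers.append(key)
        return True
```

With this representation, `("p", "a", "Y")` and `("p", "a", "Z")` share one table entry, while `("p", "X", "X")` and `("p", "A", "B")` do not.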

Definition 2 

Let $G = (A_1, A_2, \ldots, A_k)$ be a goal. The first subgoal $A_1$ is called the selected subgoal
of the goal. $G'$ is derived from $G$ by using a tabled answer $F$ if there exists a unifier
$\theta$ such that $A_1\theta = F$ and $G' = (A_2, \ldots, A_k)\theta$. $G'$ is derived from $G$ by using a rule
"$H$ :- $B_1, \ldots, B_m$" if $A_1\theta = H\theta$ and $G' = (B_1, \ldots, B_m, A_2, \ldots, A_k)\theta$. $A_1$ is said to be
the parent of $B_1$, ..., and $B_m$. The relation ancestor is defined recursively from the
parent relation.

Fig. 1. A top-most looping subgoal.



Definition 3 

A tabled subgoal that occurs first in the construction of an SLD tree is called a
pioneer, and all subsequent variants are called followers of the pioneer. Let $G_0$ be
a given goal, and $G_0 \Rightarrow G_1 \Rightarrow \ldots \Rightarrow G_n$ be a derivation where each goal is derived
from the goal immediately preceding it. Let $G_i \Rightarrow \ldots \Rightarrow G_j$ be a sub-sequence
of the derivation where $G_i = (A \ldots)$ and $G_j = (A' \ldots)$. The sub-sequence forms a
loop if $A$ and $A'$ are variants. The subgoals $A$ and $A'$ are called looping subgoals.
In particular, $A$ is called the pioneer looping subgoal and $A'$ is called the follower
looping subgoal of the loop.

Notice that the pioneer and follower looping subgoals are not required to have the
ancestor-descendant relationship, and thus a derivation that contains two variant
subgoals may not be a real loop. Consider, for example, the goal "p(X),p(Y)"
where p is defined by facts. The derivation "p(X),p(Y)" $\Rightarrow$ "p(Y)" is treated as a
loop although the selected subgoal p(Y) in the second goal is not a descendant of
p(X).

Definition 4 

A subgoal $A$ is said to be dependent on another subgoal $A'$ if $A'$ occurs in a derived
goal from $A$, i.e., $A \Rightarrow \ldots \Rightarrow (A' \ldots)$. Two subgoals are said to be inter-dependent
if they are dependent on each other. Inter-dependent subgoals constitute a cluster,
which is called a strongly connected component elsewhere (Sagonas and Swift 1998).
A subgoal in a cluster is called the top-most subgoal of the cluster if none of its
ancestors is included in the cluster.



Unless a cluster contains only a single subgoal, its top-most subgoal must also 
be a looping subgoal. For example, the subgoals at the nodes in the SLD tree in 
Figure 1 constitute a cluster and the subgoal p at node 1 is the top-most looping
subgoal of the cluster.

3 Linear Tabling and Answer Consumption Strategies

Linear tabling takes a transformed program and a goal, and tries to find a path
in the SLD tree that leads to an empty goal. The primitive table_start(A) is
executed when a tabled subgoal A is encountered. Just as in SLD resolution, linear
tabling explores the SLD tree in a depth-first fashion, taking special actions when
table_start(A), memo(A), and check_completion(A) are encountered. Backtracking
is done in exactly the same way as in SLD resolution. When the current path 
reaches a dead end, meaning that no action can be taken on the selected subgoal, 
execution backtracks to the latest previous goal in the path and continues with an 
alternative branch. When execution backtracks to the top-most looping subgoal of 
a cluster, however, we cannot fail the subgoal even after all the alternative clauses 
have been tried. In general, the evaluation of a top-most looping subgoal must be 
iterated until its fixpoint is reached. We call each iteration of a top-most looping 
subgoal a round. 

Various linear tabling methods can be devised based on the framework. A linear 
tabling method comprises strategies used in the three primitives: table_start(A),
memo(A), and check_completion(A). In linear tabling, a pioneer subgoal has two 
roles: one is to produce answers into the table and the other is to return answers 
to its parent through its variables. Different strategies can be used to produce and 
return answers. The lazy strategy gives priority to answer production and the eager 
strategy prefers answer consumption over production. In the following we define the 
three primitives in both strategies. 

3.1 The lazy strategy 

The lazy strategy postpones the consumption of answers until no answers can be 
produced. Concretely, for top-most looping subgoals no answer is returned until
they are complete, and for other pioneer subgoals answers are consumed only after 
all the rules have been tried. 

3.1.1 table_start(A)

This primitive is executed when a tabled subgoal A is encountered. The subgoal 
A is registered into the table if it is not registered yet. If A's state is complete 
meaning that A has been completely evaluated before, then A is resolved by using 
the answers in the table. 

If A is a pioneer, meaning that it is encountered for the first time in the current
path, then different actions are taken depending on A's state. If A's state is evaluated 
meaning that A has occurred before in a different path during the current round, 
then it is resolved by using answers. Otherwise, if A has never occurred before during 
the current round, it is resolved by using rules. In this way, a pioneer subgoal needs 
to be evaluated only once in each round. 

If A is a follower of some ancestor $A_0$, meaning that a loop has been encountered,²
then it is resolved by using the answers in the table. After the answers are exhausted,
A fails. Failing A is unsafe in general since it may not have returned all of its possible
answers. For this reason, the top-most looping subgoal of the cluster of A needs to be
iterated until no new answer can be produced.

² As will be discussed later, $A_0$ must be an ancestor of A under the lazy strategy.



3.1.2 memo(A)

This primitive is executed when an answer is found for the tabled subgoal A. If the
answer A is already in the table, then just fail; otherwise fail after the answer is
added into the table. The failure of memo postpones the return of answers until all
rules have been tried.

3.1.3 check_completion(A)

This primitive is executed when the subgoal A is being resolved by using rules and
the dummy ending rule is being tried. If A has never occurred in a loop, then A's
state is set to complete and A is failed after all the answers are consumed.

If A is a top-most looping subgoal, we check if any new answers are produced
during the last iteration of the cluster under A. If so, A is re-evaluated by calling
table_start(A) after all the dependent subgoals' states are initialized. Otherwise,
if no new answer is produced, A is resolved by using answers after its state and all
its dependent subgoals' states are set to complete. Notice that a top-most looping
subgoal does not return any answers until it is complete.

If A is a looping subgoal but not a top-most one, A will be resolved by using
answers after its state is set to evaluated. Notice that A's state cannot be set
to complete since A is contained in a loop whose top-most subgoal has not been
completely evaluated. For example, in Figure 1, q reaches its fixpoint only after the
top-most looping subgoal p reaches its fixpoint.

As described in the definition of table_start(A), an evaluated subgoal is never
evaluated using rules again in the same round. This optimization is called subgoal
optimization in (Zhou and Sato 2003). If evaluating a subgoal produces some new
answers then the top-most looping subgoal will be re-evaluated and so will the
subgoal; and if evaluating a subgoal does not produce any new answer, then evaluating
it again in the same round would not produce any new answers either. Therefore,
the subgoal optimization is safe.

3.1.4 Example

Consider the following program, where p/2 is tabled, and the query p(a,Y0).

p(X,Y) :- p(X,Z), e(Z,Y), memo(p(X,Y)).     (p1)
p(X,Y) :- e(X,Y), memo(p(X,Y)).             (p2)
p(X,Y) :- check_completion(p(X,Y)).         (p3)

e(a,b).
e(b,c).

The following shows the steps that lead to the production of the first answer:
1: p(a,Y0)

⇓ apply p1

2: p(a,Z1), e(Z1,Y0), memo(p(a,Y0))

loop found, backtrack to goal 1

1: p(a,Y0)

⇓ apply p2

3: e(a,Y0), memo(p(a,Y0))

⇓ apply e(a,b)

4: memo(p(a,b))

⇓ add answer p(a,b)

After the answer p(a,b) is added into the table, memo(p(a,b)) fails. The failure
forces execution to backtrack to p(a,Y0).

1: p(a,Y0)

⇓ apply p3

5: check_completion(p(a,Y0))

Since p(a,Y0) is a top-most looping subgoal which has not been completely
evaluated yet, check_completion(p(a,Y0)) does not consume the answer in the table
but instead starts re-evaluation of the subgoal.

1: p(a,Y0)

⇓ apply p1

6: p(a,Z1), e(Z1,Y0), memo(p(a,Y0))

⇓ use answer p(a,b)

7: e(b,Y0), memo(p(a,Y0))

⇓ apply e(b,c)

8: memo(p(a,c))

When the follower p(a,Z1) is encountered this time, it consumes the answer p(a,b).
The current path leads to the second answer p(a,c). On backtracking, the goal
numbered 6 becomes the current goal.

6: p(a,Z1), e(Z1,Y0), memo(p(a,Y0))

⇓ use answer p(a,c)

9: e(c,Y0), memo(p(a,Y0))

Goal 9 fails. Execution backtracks to the top goal and tries the clause p3 on it.

1: p(a,Y0)

⇓ apply p3

10: check_completion(p(a,Y0))

Since the new answer p(a,c) is produced in the last round, the top-most looping
subgoal p(a,Y0) needs to be evaluated again. The next round produces no new
answer and thus the subgoal's state is set to complete. After that the top-most
subgoal returns the answers p(a,b) and p(a,c).
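
The round structure of this example can be imitated with a small simulation. The sketch below is written in Python purely for illustration and is hard-wired to the left-recursive transitive closure program above; the function name `lazy_rounds` and the log format are assumptions of this example, and the code models only the order in which answers enter the table, not the stack behavior of a real engine.

```python
def lazy_rounds(edges, source):
    """Iterate the top-most looping subgoal p(source, Y) until a round adds nothing.

    edges is a list of (from, to) pairs standing for the facts e/2.
    Returns the tabled answers, the number of rounds, and a production log.
    """
    table = []                 # answers Y of p(source, Y), in insertion order
    log = []                   # (round, rule, answer) for each new answer
    rounds = 0
    while True:
        rounds += 1
        size_before = len(table)
        # Rule p1: the follower p(source, Z) consumes tabled answers one by one,
        # including answers added earlier in this same round.
        i = 0
        while i < len(table):
            z = table[i]
            for u, y in edges:
                if u == z and y not in table:     # memo adds only new answers
                    table.append(y)
                    log.append((rounds, "p1", y))
            i += 1
        # Rule p2: p(X,Y) :- e(X,Y).
        for u, y in edges:
            if u == source and y not in table:
                table.append(y)
                log.append((rounds, "p2", y))
        # Rule p3: check_completion re-evaluates only if the round was productive.
        if len(table) == size_before:
            return table, rounds, log


answers, rounds, log = lazy_rounds([("a", "b"), ("b", "c")], "a")
print(answers, rounds, log)
```

For the facts e(a,b) and e(b,c) the simulation needs three rounds, with the last one only confirming the fixpoint. The same loop also terminates on a cyclic relation such as e(a,b), e(b,a), where SLD resolution would not.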
3.1.5 Properties of the lazy strategy

Under the lazy strategy, answers are not returned immediately after they are pro- 
duced but are returned via the table after all clauses are tried. No answer is returned 
for a top-most looping subgoal until the subgoal is complete. 

All loops are guaranteed to be real: for any loop $G_i = (A \ldots) \Rightarrow \ldots \Rightarrow G_j = (A' \ldots)$
where $A$ and $A'$ are variants, $A$ must be an ancestor of $A'$. Because each
cluster of inter-dependent subgoals is completely evaluated before any answers are
returned to outside of the cluster, the lazy strategy has good locality and is thus
suited for finding all solutions. For example, when the subgoal p(Y) is encountered
in the goal "p(X),p(Y)", the subtree for p(X) must have been explored completely
and thus need not be saved for evaluating p(Y).

The cut operator cannot be handled efficiently under the lazy strategy. The goal
"p(X), !, q(X)" produces all the answers for p(X) even though only one is needed.

3.2 The eager strategy 

The eager strategy prefers answer consumption and return over production. For a
pioneer, answers are used first and rules are used only after all available answers
are exhausted, and moreover a new answer is returned to its parent immediately
after it is added into the table. The following describes how the three primitives
behave under the eager strategy.

3.2.1 table_start(A)

Just as in the lazy strategy, A is registered if it is not registered yet. A is resolved by 
using the tabled answers if A is complete or A is a follower of some former variant 
subgoal. If A is a pioneer, being encountered for the first time in the current round, 
it is resolved by using answers first, and then rules after all existing answers are 
exhausted. 

3.2.2 memo(A)

If the answer A is already in the table, then this primitive fails; otherwise, this 
primitive succeeds after adding the answer A into the table. Notice that A is re- 
turned immediately after it is added into the table. If A is not new, then it must 
have been returned before. 

3.2.3 check_completion(A)

If A is a top-most looping subgoal, just as in the lazy strategy, we check whether
any new answers are produced during the last iteration of A. If so, A is
evaluated again by calling table_start(A). Otherwise, if no new answer is produced,
this primitive fails after A's and all its dependent subgoals' states are set to
complete. If A is a looping subgoal but not a top-most one, this primitive fails after
A's state is set to evaluated. An evaluated subgoal is never evaluated using rules
again in the same round. Notice that unlike under the lazy strategy, the primitive
check_completion(A) never returns any answers under the eager strategy. As
described above, all the available answers must have been returned by table_start(A)
and memo(A) by the time check_completion(A) is executed.

3.2.4 Example 

Because of the need to re-evaluate a top-most looping subgoal, redundant solutions 
may be observed for a query. Consider, for example, the following program and the 
query "p(X),p(Y)". 

p(1) :- memo(p(1)).                (r1)
p(2) :- memo(p(2)).                (r2)
p(X) :- check_completion(p(X)).    (r3)

The following derivation steps lead to the return of the first solution (1,1) for
(X,Y).

1: p(X), p(Y)

⇓ use r1

2: memo(p(1)), p(Y)

⇓ add answer p(1)

3: p(Y)

⇓ loop found, use answer p(1)

When the subgoal p(Y) is encountered, it is treated as a follower and is resolved
using the tabled answer p(1). After that the first solution (1,1) is returned to the
top query. When execution backtracks to p(Y), it fails since it is a follower and no
more answer is available in the table. Execution backtracks to p(X), which produces
and adds the second answer p(2) into the table.

1: p(X), p(Y)

⇓ use r2

4: memo(p(2)), p(Y)

⇓ add answer p(2)

5: p(Y)

⇓ use answer p(1)

When p(Y) is encountered this time, there are two answers p(1) and p(2) in the
table. So the next two solutions returned are (2,1) and (2,2). When execution
backtracks to goal 1, the dummy ending rule is applied.

1: p(X), p(Y)

⇓ use r3

6: check_completion(p(X)), p(Y)



Linear Tabling Strategies and Optimizations 






Since new answers are added into the table during this round, the subgoal p(X)
needs to be evaluated again, first using answers and then using rules. The second
round produces no new answer but returns the four solutions (1,1), (1,2), (2,1) and
(2,2) among which only (1,2) has not been observed before.
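
The order in which solutions are observed in this example can be reproduced with a short generator. This is a sketch in Python that is specific to the goal "p(X),p(Y)" with p/1 defined by facts; the name `eager_solutions` is an assumption of this example, and the code imitates only the answers-first, rules-second behavior of the pioneer and the table-only behavior of the follower.

```python
def eager_solutions(facts):
    """Yield (X, Y) for the goal p(X),p(Y) in the order solutions are observed."""
    table = []                                # tabled answers of p/1
    while True:                               # one iteration = one round
        size_before = len(table)
        # table_start: the pioneer p(X) first uses the answers already tabled ...
        for x in list(table):
            for y in list(table):             # the follower p(Y) reads the table
                yield (x, y)
        # ... and then the rules. memo succeeds only for a new answer,
        # which is returned to the parent immediately.
        for x in facts:
            if x not in table:
                table.append(x)
                for y in list(table):
                    yield (x, y)
        # check_completion: stop once a round adds no answer.
        if len(table) == size_before:
            return


observed = list(eager_solutions([1, 2]))
print(observed)
```

Seven solutions are observed in total, although only four are distinct; (1,2) appears only in the second round.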



3.2.5 Properties of the eager strategy 

Since answers are returned eagerly, a pioneer and a follower may not have an
ancestor-descendant relationship. Because of the existence of this kind of fake loop
and the necessity of iterating the evaluation of top-most looping subgoals,
redundant solutions may be observed. In the previous example, the solutions (1,1),
(2,1) and (2,2) are each observed twice. If the top-most looping subgoal
p(X) did not return the answer p(1) again in the second round, the solution
(1,2) would be lost.

The eager strategy is more suited than the lazy strategy for single-solution search. 
For certain applications such as planning it is unreasonable to find all answers 
either because the set is infinite or because only one answer is needed. For these 
applications the eager strategy is more effective than the lazy one. Cuts are handled 
more efficiently under the eager strategy. 



4 Semi-naive Optimization 

The basic linear tabling framework described in the previous section does not dis- 
tinguish between new and old answers. The problem with this naive method is 
that it redundantly joins answers of subgoals that have been joined in earlier rounds.
Semi-naive optimization (Ullman 1988) reduces the redundancy by ensuring that at
least one new answer is involved in the join of the answers for each rule. In this sec- 
tion, we introduce semi-naive optimization into linear tabling and identify sufficient 
conditions for it to be complete. We also propose a technique called early answer 
promotion to further avoid redundant consumption of answers. This optimization 
works with both the lazy and eager strategies. 



4.1 Preparation



To make semi-naive optimization possible, we divide the answer table for each 
tabled subgoal into three regions: 



| old | previous | current |

The names of the regions indicate the rounds during which the answers in the 
regions are produced: old means that the answers were produced before the previous
round, previous the answers produced during the previous round, and current the 
answers produced in the current round. The answers stored in previous and current 
are said to be new. Before each round is started, answers are promoted accordingly: 
previous answers become old and current answers become previous. 
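
One way to picture the three regions is as two boundaries in a single ordered list of answers. The following is a minimal sketch in Python under that assumption; the class `AnswerTable` and its method names are hypothetical and are not taken from the B-Prolog implementation.

```python
class AnswerTable:
    """Answers kept in insertion order, split into old / previous / current."""

    def __init__(self):
        self.answers = []
        self.previous_start = 0    # index of the first 'previous' answer
        self.current_start = 0     # index of the first 'current' answer

    def add(self, answer):
        """Add an answer to the current region; return False if already tabled."""
        if answer in self.answers:
            return False
        self.answers.append(answer)
        return True

    def promote(self):
        """Called before a round starts: previous -> old, current -> previous."""
        self.previous_start = self.current_start
        self.current_start = len(self.answers)

    def old(self):
        return self.answers[:self.previous_start]

    def previous(self):
        return self.answers[self.previous_start:self.current_start]

    def current(self):
        return self.answers[self.current_start:]

    def new(self):
        """The new answers are those in the previous and current regions."""
        return self.answers[self.previous_start:]
```

Promotion in this layout moves only the two indices and copies no answers.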
In our optimization, answers are consumed sequentially. For a subgoal, either all
the available answers or only new answers are consumed. This is unlike in bottom- 
up evaluation where answers are consumed incrementally, i.e., answers produced 
in a round are not consumed until the next round. As will be discussed later, 
incremental consumption of answers as is done in bottom-up evaluation does avoid 
certain redundant joins but does not fit linear tabling since it may require more 
rounds to reach fixpoints. 

A predicate p calls a predicate q if: (1) q occurs in the body of at least one rule
in the definition of p (p calls q directly); or (2) q does not occur in the body of any
rule in the definition of p but there exists a predicate in the body of a rule in the
definition of p that calls q (p calls q indirectly). The calling relationship constitutes
a graph called a call graph.

For a given program, we find a level mapping from the predicate symbols in the 
program to the set of integers to represent the call graph of the program. Let m be 
a level mapping. We extend the notation to assume that m(p(...)) = m(p/n) for
any subgoal p(...) of arity n.

Definition 5 

For a given program, a level mapping m represents the call graph if: for each rule
"$H$ :- $A_1, \ldots, A_n$" in the program, $m(H) > m(A_i)$ iff the predicate of $A_i$ does not
call (either directly or indirectly) the predicate of $H$, and $m(H) = m(A_i)$ iff the
predicates of $H$ and $A_i$ call each other.

The level mapping as defined divides predicates in a program into several strata. 
The predicate at each stratum depends only on those on the lower strata. The 
level mapping is an abstract representation of the dependence relationship of the 
subgoals that may occur in execution. If two subgoals A and A' occur in a loop, 
then it is guaranteed that m(A) = m(A').

Definition 6 

Let "$H$ :- $A_1, \ldots, A_k, \ldots, A_n$" be a rule in a program and m be the level mapping that
represents the call graph of the program. $A_k$ is called the last depending subgoal of
the rule if $m(A_k) = m(H)$ and $m(H) > m(A_i)$ for $i > k$.

The last depending subgoal Ak is the last subgoal in the body that may depend 
on the head to become complete. Thus, when the rule is re-executed on a subgoal, all 
the subgoals to the right of Ak that have occurred before must already be complete. 

Definition 7 

Let "$H$ :- $A_1, \ldots, A_n$" be a rule in a program and m be a level mapping that
represents the call graph of the program. If there is no depending subgoal in the body,
i.e., $m(H) > m(A_i)$ for $i = 1, \ldots, n$, then the rule is called a base rule.
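
Definitions 5 to 7 can be made concrete with a small computation over a call graph. The sketch below is in Python and makes several simplifying assumptions: predicates are identified by name only (arity is ignored), the call graph is given as a dictionary from a predicate to the predicates occurring in the bodies of its rules, and the functions `level_mapping` and `last_depending` are hypothetical names introduced for this illustration. Mutually recursive predicates receive the same level, and every other callee receives a strictly lower one.

```python
def reachable(calls, p):
    """All predicates that p calls directly or indirectly."""
    seen, stack = set(), list(calls.get(p, ()))
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(calls.get(q, ()))
    return seen


def level_mapping(calls):
    """Assign each predicate a level consistent with Definition 5."""
    preds = set(calls) | {q for body in calls.values() for q in body}
    reach = {p: reachable(calls, p) for p in preds}
    levels = {}

    def level(p):
        if p not in levels:
            # Predicates that call each other form one stratum.
            members = {p} | {q for q in reach[p] if p in reach[q]}
            below = {q for m in members for q in calls.get(m, ()) if q not in members}
            value = 1 + max((level(q) for q in below), default=-1)
            for m in members:
                levels[m] = value
        return levels[p]

    for p in preds:
        level(p)
    return levels


def last_depending(head, body, levels):
    """Index of the last depending subgoal, or None for a base rule."""
    same = [k for k, a in enumerate(body) if levels[a] == levels[head]]
    return same[-1] if same else None


# Call graph of a program in which p and q are mutually recursive and q calls t.
calls = {"p": ["p", "q"], "q": ["p", "t"], "t": []}
levels = level_mapping(calls)
print(levels)
print(last_depending("p", ["p", "q"], levels))
print(last_depending("q", ["p", "t"], levels))
print(last_depending("p", [], levels))
```

In the rule with body `["p", "t"]`, the subgoal for t lies to the right of the last depending subgoal, so by the remark after Definition 6 it must already be complete when the rule is re-executed.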



4.2 Semi-naive optimization

Theorem 1

Let "$H$ :- $A_1, \ldots, A_k, \ldots, A_n$" be a rule where $A_k$ is the last depending tabled subgoal,
and $C$ be a subgoal that is being resolved by using the rule in an iteration of a
top-most looping subgoal $T$. For a combination of answers of $A_1, \ldots,$ and $A_{k-1}$, if $C$ has
occurred in an early round and the combination does not contain any new answers,
then it is safe to let $A_k$ consume new answers only.



Proof 

Because $A_k$ is the last depending subgoal, the subgoals $A_{k+1}, \ldots,$ and $A_n$ must have
been completely evaluated when $C$ is re-evaluated. Let $A_k^{old}$ and $A_k^{new}$ be the old
and new answers of the subgoal $A_k$, respectively. For a combination of answers of
$A_1, \ldots,$ and $A_{k-1}$, if the combination does not contain new answers then the join
of the combination and $A_k^{old}$ must have been done and all possible answers for $C$
that can result from the join must have been produced during the previous round
because the subgoal $C$ has been encountered before. Therefore only new answers in
$A_k^{new}$ should be used. □



Corollary 1 

Base rules need not be considered in the re-evaluation of any subgoals. 

Semi-naive optimization would be unsafe if it were applied to new subgoals that
have never been encountered before. The following example illustrates this possi- 
bility: 

?- p(X,Y).

:- table p/2.
p(X,Y) :- p(X,Z),q(Z,Y).     (C1)
p(b,c) :- p(X,Y).            (C2)
p(a,b).                      (C3)

:- table q/2.
q(c,d) :- p(X,Y),t(X,Y).     (C4)

t(a,b).                      (C5)

In the first round of p(X,Y) the answer p(a,b) is added to the table by C3, and 
in the second round the rule C2 produces the answer p(b,c) by using the answer 
produced in the first round. In the third round, the rule C1 generates a new subgoal
q(c,Y) after p(X,Z) consumes p(b,c). If semi-naive optimization were applied to 
q(c,Y), then the subgoal p(X,Y) in C4 could consume only the new answer p(b,c) 
and the third answer p(b,d) would be lost. 

4.3 Analysis

Semi-naive optimization can lower the complexity of evaluation for some programs. 
Consider the following example created by David S. Warren:

:- table p/2.
p(X,Y) :- p(X,Z),c(Z,a,Y).
p(X,Y) :- p(X,Z),c(Z,b,Y).
p(X,X).

which detects if a given string represented as facts c(I,S,J) (J = I+1, S = a or
S = b) is a sentence of the regular expression (a|b)*. For a string $(ab)^{n/2}$, the query
p(0,n) needs n/2 rounds to reach the fixpoint. With semi-naive optimization, the
variants of p(X,Z) in the bodies consume only new answers, and therefore the
program takes linear time. Without semi-naive optimization, however, the program
would take $O(n^2)$ time since the variants of p(X,Z) would consume all existing
answers in each round.
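
The difference in the amount of work can be made concrete with a small simulation. The following Python sketch is not the B-Prolog engine and makes several simplifying assumptions: the open query p(0,Y) is evaluated, the answer p(0,0) from the fact p(X,X) is taken to be tabled before the first round, answers are consumed sequentially (including those produced in the current round), and the number of consumed answers is used as a stand-in for running time. The name `run` and its return values are hypothetical.

```python
def run(n, semi_naive):
    """Simulate the rounds needed for p(0,Y) on the string (ab)^(n/2).

    Returns (rounds, consumed, answers), where `consumed` counts how many
    tabled answers the recursive subgoal p(0,Z) consumes in total.
    """
    def symbol(i):
        return "ab"[i % 2]          # models the fact c(i, symbol(i), i+1), 0 <= i < n

    answers = [0]                   # Y values of the answers p(0,Y)
    seen = {0}
    last_old = 0                    # answers[:last_old] is the old region
    last_prev = 0                   # answers[last_old:last_prev] is the previous region
    rounds = consumed = 0
    while True:
        rounds += 1
        size_at_start = len(answers)
        for s in "ab":              # the two recursive clauses, in textual order
            i = last_old if semi_naive else 0
            while i < len(answers):  # answers added during the round are consumed too
                z = answers[i]
                i += 1
                consumed += 1
                if z < n and symbol(z) == s and z + 1 not in seen:
                    seen.add(z + 1)
                    answers.append(z + 1)
        if len(answers) == size_at_start:
            return rounds, consumed, answers
        # Promote answers before the next round starts.
        last_old, last_prev = last_prev, len(answers)


rounds_semi, consumed_semi, answers_semi = run(200, semi_naive=True)
rounds_naive, consumed_naive, answers_naive = run(200, semi_naive=False)
```

Both variants compute the same answers in the same number of rounds; only the number of consumed answers differs, growing roughly linearly in n with the optimization and quadratically without it.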

In our semi-naive optimization, answers produced in the current round are con- 
sumed immediately rather than postponed to the next round as in the bottom-up 
version, and answers are promoted each time a new round is started. This way of 
consuming and promoting answers may cause certain redundancy. 

Consider the conjunction (P, Q). Assume $Q_o$, $Q_p$, and $Q_c$ are the sets of answers
in the three regions (respectively, old, previous, and current) of the subgoal Q when
Q is encountered in round i. Assume also that P had been complete before round
i and $P_a$ is the set of its answers. The join $P_a \bowtie (Q_p \cup Q_c)$ is computed for the
conjunction in round i. Assume $Q'_o$, $Q'_p$, and $Q'_c$ are the sets of answers in the three
regions when Q is encountered in round i+1. Since answers are promoted before
round i+1 is started, we have:

$$Q'_o = Q_o \cup Q_p$$
$$Q'_p = Q_c \cup \alpha$$

where $\alpha$ denotes the new answers produced for Q after the conjunction (P, Q) in
round i. When the conjunction (P, Q) is encountered in round i+1, the following
join is computed:

$$P_a \bowtie (Q'_p \cup Q'_c) = P_a \bowtie (Q_c \cup \alpha \cup Q'_c)$$

Notice that the join $P_a \bowtie Q_c$ is computed in both round i and round i+1.
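
The overlap between the two rounds can be checked on a toy instance. In the following Python sketch the answer sets are made up for illustration, and the join is modelled as a plain Cartesian product, which assumes that P and Q share no variables.

```python
from itertools import product


def join(p, q):
    """Model the join of two answer sets as their Cartesian product."""
    return set(product(p, q))


# Hypothetical answer sets as seen when Q is encountered in round i.
Pa = {"p1", "p2"}
Qo, Qp, Qc = {"q0"}, {"q1"}, {"q2", "q3"}
alpha = {"q4"}        # answers produced for Q after the conjunction in round i
Qc_next = {"q5"}      # current region of Q when it is encountered in round i+1

round_i = join(Pa, Qp | Qc)

# Promotion before round i+1 starts.
Qo_next = Qo | Qp
Qp_next = Qc | alpha

round_i_plus_1 = join(Pa, Qp_next | Qc_next)

# Pairs computed in both rounds.
redundant = round_i & round_i_plus_1
```

Here `redundant` is exactly the join of `Pa` with `Qc`, which is the repeated work identified above.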

We could allow last depending subgoals to consume answers incrementally as 
is done in bottom-up evaluation, but doing so may require more rounds to reach 
fixpoints. Consider the following example, which is the same as the one shown above 
but has a different ordering of clauses: 

?- p(X,Y).

:- table p/2.
p(a,b).                      (C1)
p(b,c) :- p(X,Y).            (C2)
p(X,Y) :- p(X,Z),q(Z,Y).     (C3)

:- table q/2.
q(c,d) :- p(X,Y),t(X,Y).     (C4)

t(a,b).                      (C5)

Footnote: The example attributed to David S. Warren earlier in this subsection is
from personal communications.



In the first round, C1 produces the answer p(a,b). When C2 is executed, the subgoal
in the body cannot consume p(a,b) since it is produced in the current round.
Similarly, C3 produces no answer either. In the second round, p(a,b) is moved to
the previous region, and thus can be consumed. C2 produces a new answer p(b,c).
When C3 is executed, no answer is produced since p(b,c) cannot be consumed. In 
the third round, p(a,b) is moved to the old region, and p(b,c) is moved to the 
previous region. C3 produces the third answer p(b,d). The fourth round produces 
no new answer and confirms the completion of the computation. So in total four 
rounds are needed to compute the fixpoint. If answers produced in the current 
round are consumed in the same round, then only two rounds are needed to reach 
the fixpoint. 



4.4 Early promotion of answers

As discussed above, sequential consumption of answers may cause redundant joins.
In this subsection, we propose a technique called early promotion of answers to
reduce the redundancy.

Definition 8

Let Q be the first follower that exhausts its answers in the current round. Then all
the answers of Q in the current region are promoted to the previous region once
they have been consumed by Q.

Consider again the conjunction (P, Q) where Q is the first follower that exhausts
its answers. The answers in the current region $Q_c$ are promoted to the previous
region after Q has consumed all its answers in round i. By doing so, the join
$P_a \bowtie Q_c$ will not be recomputed in round i+1 since $Q_c$ must have been promoted
to the old region in round i+1.

Consider, for example, the following program:

?- p(X,Y).

:- table p/2.
p(a,b).                      (C1)
p(b,c) :- p(X,Y).            (C2)

Before C2 is executed in the first round, p(a,b) is in the current region. Executing
C2 produces the second answer p(b,c). Since the subgoal p(X,Y) in C2 is the first
follower that exhausts its answers in the current round, it is qualified to promote
its answers. So the answers p(a,b) and p(b,c) are moved from the current region
to the previous region immediately after being consumed by p(X,Y). As a result,
the potential redundant consumption of these answers by p(X,Y) is avoided in the
second round since they will all be transferred to the old region before the second
round starts.

Theorem 2 

Early promotion does not lose any answers. 
Proof 

First note that although answers are tabled in three disjoint regions, all tabled
answers will be consumed except for some last depending subgoals that would
skip the answers in their old regions (see Theorem 1). Assume, on the contrary,
that applying early promotion loses answers. Then there must be a last depending
subgoal Ak in a rule "H :- A1, ..., Ak, ..., An" and a tabled answer A for Ak such
that A has been moved to the old region before being consumed by Ak, so that A
will never be consumed by Ak. Assume A is produced in round i by a variant of
Ak. We distinguish between the following two cases:

1. The last depending subgoal Ak is not selected in round i. In round j (j > i), Ak is
selected either because H is new or some As (s < k) consumes a new answer. By
Theorem 1, Ak will consume all answers in the three regions, including the answer
A.

2. Otherwise, A must be produced by Ak itself or a variant subgoal of Ak that is
selected either before or after Ak in round i. If A is produced by Ak itself or before
Ak is selected, then the answer will be consumed by Ak since promoted answers
will remain new by the end of the round. If A is produced by a variant after Ak
is selected, then the answer cannot be promoted because Ak exhausts its answers
before the variant. In this case, the answer A will remain new in the next round
and will thus be consumed by Ak.

Both of the above two cases contradict our assumption. The proof then concludes. 
□ 

5 Implementation 

Changes to the Prolog machine ATOAM (Zhou 1996) are needed to implement
linear tabling. In this section we describe the changes to the data structures and 
the instruction set. To make the paper self-contained, we first give an overview of 
the ATOAM architecture. 

5.1 An overview of ATOAM 

The ATOAM uses all the data areas used by the WAM. The heap stores terms 
created during execution. The register H points to the top of the heap. The trail stack 
stores updates that must be undone upon backtracking. The register T points to 
the top of the trail stack. The control stack stores frames associated with predicate
calls. 

Unlike in the WAM where arguments are passed through argument registers, 
arguments in the ATOAM are passed through stack frames and only one frame is 
used for each predicate call. Each time a predicate is invoked by a call, a frame 
is placed on top of the local stack unless the frame currently at the top can be 
reused. Frames for different types of predicates have different structures. For stan- 
dard Prolog, a frame is either determinate or nondeterminate. A nondeterminate 
frame is also called a choice point. The register AR points to the current frame and 
the register B points to the latest choice point. 

A determinate frame has the following structure: 



A1..An    Arguments
AR        Pointer to the parent frame
CP        Continuation program pointer
BTM       Bottom of the frame
TOP       Top of the frame
Y1..Ym    Local variables



where BTM points to the bottom of the frame, i.e., the slot for the first argument,
and TOP points to the top of the frame, i.e., the slot just next to that for the last
local variable*. The TOP register points to the next available slot on the stack.
The BTM slot is not in the original version (Zhou 1996). This slot is introduced
for supporting garbage collection and co-routining. The AR register points to the
AR slot of the current frame. Arguments and local variables are accessed through
offsets with respect to the AR slot. An argument or a local variable is denoted as
y(I) where I is the offset. Arguments have positive offsets and local variables have
negative offsets. It is the caller's job to place the arguments and fill in the AR and
CP slots. The callee fills in the BTM and TOP slots and initializes the local variables.

A choice point frame contains, in addition to the slots in a determinate frame, 
four slots located between the TOP slot and local variables: 



CPF       Backtracking program pointer
H         Top of the heap
T         Top of the trail
B         Parent choice point



The CPF slot stores the program pointer to continue with when the current branch 
fails. The slot H points to the top of the heap when the frame is allocated. As in 
the WAM, a new register, called HB, is used as an alias for B->H. When a variable 
is bound, it must be trailed if it is older than B or HB. 

5.2 The extension of ATOAM for tabling 

A new data area, called table area, is introduced for memorizing tabled subgoals and
their answers. The subgoal table is a hash table that stores all the tabled subgoals
encountered in execution. For each tabled subgoal and its variants, there is an entry
in the table, which is a record containing the following information:

* It is a convention in the literature that the stack is assumed to grow downwards.

SubgoalCopy
PioneerAR
State
TopMostLoopingSubgoal
DependentSubgoals
AnswerTable

The field SubgoalCopy points to the copy of the subgoal in the table area. In the 
copy, all variables are numbered. Therefore all variants of the subgoal are identical. 
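
The effect of numbering variables can be sketched in Python. The term representation below is an assumption made for the illustration (compound terms as tuples whose first element is the functor, variables as strings starting with an upper-case letter or an underscore), and `number_variables` is a hypothetical helper, not a B-Prolog routine. Anonymous variables are not treated specially.

```python
def number_variables(term, mapping=None):
    """Return a copy of `term` with variables replaced by ("$VAR", n),
    where n is the order of first occurrence from left to right."""
    if mapping is None:
        mapping = {}
    if isinstance(term, tuple):
        # Compound term: keep the functor, number the arguments in order.
        return (term[0],) + tuple(number_variables(a, mapping) for a in term[1:])
    if isinstance(term, str) and (term[:1].isupper() or term[:1] == "_"):
        if term not in mapping:
            mapping[term] = len(mapping)
        return ("$VAR", mapping[term])
    return term                      # atom or number


# A dictionary keyed by the numbered copy stands in for the subgoal table.
subgoal_table = {}
key1 = number_variables(("p", "X", "Y"))
key2 = number_variables(("p", "A", "B"))      # a variant of p(X,Y)
key3 = number_variables(("p", "X", "X"))      # not a variant of p(X,Y)
subgoal_table.setdefault(key1, {"answers": []})
subgoal_table.setdefault(key2, {"answers": []})
subgoal_table.setdefault(key3, {"answers": []})
```

Since variants map to the same key, `subgoal_table` ends up with two entries rather than three.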

The field PioneerAR points to the frame of the pioneer, which is needed for 
implementing cuts. When the choice point of a tabled subgoal is cut off before the 
subgoal reaches completion, the field PioneerAR will be set to NULL. When a variant 
of the subgoal is encountered again afterwards, the subgoal will be treated as a pioneer.

The field State indicates whether the subgoal is a looping subgoal, whether the 
answer table has been revised, and whether the subgoal is complete or evaluated. 
When execution backtracks to a top-most looping subgoal, if the revised bit is 
set, then another round will be started for the subgoal. A top-most looping subgoal 
becomes complete if this revised bit is unset after a round. At that time, the subgoal 
and all of its dependent subgoals will be set to complete. As described in Section 3.1.3, an
evaluated subgoal is never evaluated again using rules in each round.

The TopMostLoopingSubgoal field points to the entry for the top-most looping 
subgoal, and the field DependentSubgoals stores the list of subgoals on which this 
subgoal depends. When a top-most looping subgoal becomes complete, all of its 
dependent subgoals turn to complete too. 

The field AnswerTable points to the answer table for this subgoal, which is 
also a hash table. Hash tables expand dynamically. Let g be the pointer to the 
record for a subgoal in the table. The first answer in the answer table is ref- 
erenced as g->AnswerTable->FirstAnswer and the last answer is referenced as 
g->AnswerTable->LastAnswer. In the beginning, the answer table is empty and 
both FirstAnswer and LastAnswer reference a dummy answer. 

The frame for a tabled predicate contains the following two slots in addition to 
those slots stored in a choice point frame: 

SubgoalTable
CurrentAnswer

The SubgoalTable slot points to the subgoal table entry, and the CurrentAnswer slot
points to the last answer that has been consumed. The next answer can be reached from
this reference on backtracking. When a frame is created, the slot CurrentAnswer 
is initialized to be g->AnswerTable->FirstAnswer where g is the pointer to the 
record for the tabled subgoal. 
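
The interplay between the dummy answer and the CurrentAnswer slot can be sketched as follows. This is a minimal Python model under stated assumptions, not the ATOAM data layout: the class and method names are hypothetical, a Python set stands in for the hash table, and answers are kept in a singly linked list in insertion order.

```python
class Answer:
    """A node in the list of answers; the dummy node has term None."""
    def __init__(self, term=None):
        self.term = term
        self.next = None


class AnswerTable:
    def __init__(self):
        dummy = Answer()
        self.first_answer = dummy     # FirstAnswer references the dummy
        self.last_answer = dummy      # LastAnswer references the dummy while empty
        self._index = set()           # stands in for the hash table

    def add(self, term):
        """Table an answer; return False if it is already in the table."""
        if term in self._index:
            return False
        node = Answer(term)
        self.last_answer.next = node
        self.last_answer = node
        self._index.add(term)
        return True


class TabledFrame:
    def __init__(self, table):
        self.table = table
        self.current_answer = table.first_answer   # last answer consumed so far

    def next_answer(self):
        """Advance to the next unconsumed answer, or return None if exhausted."""
        node = self.current_answer.next
        if node is None:
            return None
        self.current_answer = node
        return node.term


table = AnswerTable()
frame = TabledFrame(table)
before = frame.next_answer()          # nothing to consume yet
table.add(("p", "a", "b"))
first = frame.next_answer()
table.add(("p", "b", "c"))            # added after the frame started consuming
second = frame.next_answer()
```

Starting from a dummy node means that a frame created on an empty table still sees answers that are added later, without any special case.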

Three new instructions, namely table_start, memo, and check_completion, are
introduced into the ATOAM for encoding the three table primitives. Figure 2 shows
the compiled code of an example program.

      :- table p/2.
      % p(X,Y) :- p(X,Z),e(Z,Y).
      % p(X,Y) :- e(X,Y).
p/2:  table_start 2,1
      fork r2
      para_value y(2)
      para_var y(-13)
      call p/2                  % p(X,Z)
      para_value y(-13)
      para_value y(1)
      call e/2                  % e(Z,Y)
      memo
r2:   fork r3
      para_value y(2)
      para_value y(1)
      call e/2                  % e(X,Y)
      memo
r3:   check_completion p/2

Fig. 2. Compiled code of an example program.



The table_start instruction takes two operands: the arity (2) and the number
of local variables (1). The fork instruction sets the CPF slot to hold the address
to backtrack to on failure. The parameter passing instructions (para_value and
para_var in this example) pass arguments to the callee's frame. The memo
instruction is executed after an answer has been found. The check_completion instruction
takes the entrance (p/2) as an operand so that the predicate can be re-entered when
it needs re-evaluation.



5.3 Implementing semi-naive optimization 

To implement semi-naive optimization, we add the following two pointers into the 
record for each tabled subgoal: 

LastOldAnswer 
LastPrevAnswer 

where the pointer LastOldAnswer points to the last answer in the old region and
the pointer LastPrevAnswer points to the last answer in the previous region. The
check_completion instruction resets the pointers for all the tabled subgoals in the
current cluster before it starts the next round:

for each subgoal g in the current cluster {
    g->LastOldAnswer = g->LastPrevAnswer;
    g->LastPrevAnswer = g->AnswerTable->LastAnswer;
}

The memo instruction is changed so that early promotion of answers is performed
if the condition for promotion is met. Let g be the pointer to the tabled subgoal.
If the subgoal has exhausted all its answers in the table and early promotion has
never been done before on the subgoal in the same round, then answers in the current
region are promoted to the previous region:

g->LastPrevAnswer = g->AnswerTable->LastAnswer 

The promoted answers will be moved to the old region before the start of the next 
round. 
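
The two pointer updates can be modelled with list indices. The following Python sketch is only an illustration under stated assumptions: `SubgoalEntry` and its methods are hypothetical names, indices into a list replace the pointers LastOldAnswer and LastPrevAnswer, and the caller is assumed to have checked that the subgoal has exhausted its answers before calling `promote_early`.

```python
class SubgoalEntry:
    """Answers of one tabled subgoal, split into old, previous and current regions."""

    def __init__(self):
        self.answers = []
        self.last_old = 0             # answers[:last_old] is the old region
        self.last_prev = 0            # answers[last_old:last_prev] is the previous region
        self.promoted_early = False   # at most one early promotion per round

    def add(self, answer):
        if answer not in self.answers:
            self.answers.append(answer)

    def regions(self):
        return (self.answers[:self.last_old],
                self.answers[self.last_old:self.last_prev],
                self.answers[self.last_prev:])

    def promote_early(self):
        """Move the current region to the previous region (done by memo)."""
        if not self.promoted_early:
            self.last_prev = len(self.answers)
            self.promoted_early = True

    def start_next_round(self):
        """Shift the regions (done by check_completion before a new round)."""
        self.last_old = self.last_prev
        self.last_prev = len(self.answers)
        self.promoted_early = False


g = SubgoalEntry()
g.add("p(a,b)")
g.add("p(b,c)")
in_round_1 = g.regions()              # both answers are in the current region
g.promote_early()
after_promotion = g.regions()         # both answers are now in the previous region
g.start_next_round()
in_round_2 = g.regions()              # both answers are in the old region
```

With early promotion, a last depending subgoal that starts from the end of the old region skips both answers in the second round; without it they would sit in the previous region and be consumed again.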

A bit vector is added into the frame for each tabled predicate to indicate if any 
new answer has been consumed by any tabled subgoal. Semi-naive optimization 
can be applied only if no tabled subgoal in the predicate has consumed any new 
answer. 

A new instruction, called last_depending_tabled_call, is introduced to encode
last depending tabled subgoals. In the example shown in Figure 2, the "call p/2"
instruction is changed to "last_depending_tabled_call p/2" to enable semi-naive
optimization. The last_depending_tabled_call instruction has the same behavior
as the call instruction, but the callee can check the type of the instruction to see
if it is invoked by a last depending tabled subgoal.

Let g be the pointer to the current tabled subgoal. The table_start instruction
sets the CurrentAnswer slot of the frame to g->LastOldAnswer so that the
subgoal consumes only new answers if: (1) the parent frame is a tabled frame; (2)
no bit in the bit vector in the parent frame is set, which means that no tabled
subgoal has consumed any new answer; and (3) the predicate is invoked by a
last_depending_tabled_call instruction. If any of these conditions is not
satisfied, the CurrentAnswer slot is set to g->AnswerTable->FirstAnswer and all the
answers will be consumed by the subgoal.
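
The three conditions can be summarised as a single decision. The Python function below is a hypothetical restatement for illustration only; its name, its parameters and the use of list positions in place of answer pointers are assumptions, not part of the ATOAM.

```python
def initial_current_answer(parent_is_tabled, parent_bit_vector,
                           via_last_depending_call, last_old, first_answer=0):
    """Choose where a tabled frame starts consuming answers.

    Returns `last_old` (skip the old region) only if all three conditions
    hold; otherwise returns `first_answer` (consume every answer).
    """
    semi_naive_applicable = (
        parent_is_tabled                       # (1) the parent frame is a tabled frame
        and not any(parent_bit_vector)         # (2) no new answer consumed so far
        and via_last_depending_call            # (3) last_depending_tabled_call
    )
    return last_old if semi_naive_applicable else first_answer


skip_old = initial_current_answer(True, [False, False], True, last_old=5)
consume_all = initial_current_answer(True, [False, True], True, last_old=5)
```

In the second call a subgoal to the left has consumed a new answer, so the combination is new and the old answers must be joined with it.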

6 Performance Evaluation 

We empirically compared the two answer consumption strategies and evaluated
the effectiveness of semi-naive optimization. We also compared the performance
of B-Prolog (version 6.9) with XSB (version 3.0). A Linux machine with a 750MHz
Intel processor and 512MB RAM was used in the experiment. Benchmarks from three
different sources were used: Datalog programs shown in Figure 3 with randomly
generated graphs; the CHAT benchmark suite (Demoen and Sagonas 1999); and a
parser, called atr, for the Japanese language defined by a grammar of over 860 rules
(Uratani et al. 1994). This section presents the experimental results and reports the
statistics to support the results. This section also gives experimental results on
Warren's example, for which SLG as implemented in XSB has lower time complexity
than linear tabling when semi-naive optimization ceases to be effective.

6.1 Comparison of the two answer-consumption strategies

Table 1 compares the two answer-consumption strategies in terms of speed and
stack space efficiencies. The difference of these two strategies in terms of CPU
time is small on average. This result implies that, for programs with cuts, declaring
the use of the eager strategy would not cause significant slow-down. The difference
in the usage of stack space is more significant than in CPU time. This is because,
as discussed before, the lazy strategy has better locality than the eager strategy.

Footnote: The benchmarks are available from probp.com/bench.tar.gz.
Footnote: Stack space is the total usage of the local, global and trail stacks.

tcl:  tcl(X,Y) :- edge(X,Y).
      tcl(X,Y) :- tcl(X,Z),edge(Z,Y).
tcr:  tcr(X,Y) :- edge(X,Y).
      tcr(X,Y) :- edge(X,Z),tcr(Z,Y).
tcn:  tcn(X,Y) :- edge(X,Y).
      tcn(X,Y) :- tcn(X,Z),tcn(Z,Y).
sg:   sg(X,X).
      sg(X,Y) :- edge(X,XX),sg(XX,YY),edge(Y,YY).

Fig. 3. Datalog programs.

Table 1. Comparison of the lazy and eager strategies.

program     CPU time          Stack space
            Lazy    Eager     Lazy    Eager
tcl         1       1.02      1       1.00
tcr         1       0.96      1       1.00
tcn         1       0.90      1       1.00
sg          1       0.89      1       1.02
cs_o        1       1.17      1       1.36
cs_r        1       1.09      1       1.36
disj        1       1.06      1       1.41
gabriel     1       1.08      1       1.18
kalah       1       1.17      1       2.03
pg          1       2.28      1       3.59
peep        1       0.99      1       2.88
read        1       0.85      1       2.22
atr         1       1.03      1       1.06
average     1       1.12      1       1.62



6.2 Effectiveness of semi-naive optimization 

Table 2 shows the effectiveness of semi-naive optimization in gaining speed-ups
under both strategies. Without this optimization, the system would consume over
30% more CPU time on average under either strategy. Our experiment also indicates
that on average over 95% of the gains in speed are attributed to the early promotion
technique.

Table 2. Effectiveness of semi-naive optimization.

program     CPU time without semi-naive optimization
            (time with the optimization = 1)
            Lazy    Eager
tcl         2.00    1.89
tcr         1.22    1.19
tcn         1.68    1.74
sg          1.22    1.51
cs_o        1.10    1.10
cs_r        1.09    1.10
disj        1.52    1.46
gabriel     1.32    1.15
kalah       1.52    1.41
pg          1.21    1.05
peep        1.09    1.11
read        1.98    1.27
atr         1.00    1.00
average     1.38    1.31



6.3 Comparison with XSB 

Table 3 compares BP with XSB on time and stack space efficiencies. For XSB,
the stack space is the total of the maximum amounts of global, local, trail, choice
point, and SLG completion stack spaces. The default setting, namely, the SLG-WAM
and the local scheduling strategy, is used. BP is faster than XSB on the
Datalog programs and the parser but slower than XSB on the CHAT benchmark
suite; and BP consumes considerably less stack space than XSB on some of the
programs (tcr, tcn, sg, and atr).

The results must be interpreted with two differences between the compared systems
taken into account: on the one hand, BP is on average more than twice as fast as
XSB for standard Prolog programs, and on the other hand the trie data structure
used in XSB (Ramakrishnan et al. 1998) is far more advanced than the hash tables
used in BP for managing the table area. It is unclear to what extent each difference
contributes to the overall efficiency.



The YAP implementation of SLG-WAM is up to twice as fast as XSB (Somogyi and Sagonas 2006)
on the transitive closure and same-generation benchmarks with both chain and 
cyclic graphs. This entails that the BP implementation of linear tabling is compa- 
rable in speed with the most sophisticated implementation of SLG-WAM for the 
Datalog benchmarks. 

The empirical data on the usage of table space are not reported. BP consistently
consumes less table space than XSB for the benchmarks. In BP, both subgoal and
answer tables are maintained as dynamic hash tables. In XSB, in contrast, tables
are maintained as tries (Ramakrishnan et al. 1998). The usage of table space is
independent of the strategies and optimizations. Both BP and XSB would consume
the same amount of table space if the same data structure were employed.

Table 3. Comparison of B-Prolog and XSB.

program     BP        XSB
            (Lazy)    CPU time    Stack space
tcl         1         1.85        0.81
tcr         1         1.46        33.41
tcn         1         1.31        32.84
sg          1         1.47        109.12
cs_o        1         0.37        0.57
cs_r        1         0.35        0.73
disj        1         0.68        0.82
gabriel     1         0.61        2.05
kalah       1         1.00        0.58
pg          1         0.76        1.85
peep        1         0.37        2.97
read        1         0.69        11.12
atr         1         2.26        21.24



6.4 Statistics on iterations

Table 4 reports the statistics on the maximum (max its.) and average (ave. its.)
numbers of iterations for tabled subgoals to reach their fixpoints. The column
#subgoals shows the number of tabled subgoals. While for some programs the
maximum number of iterations performed is high (e.g., the maximum number for
atr is 6), the average numbers are quite low.

Footnote: Each subgoal has a counter which is initialized when the subgoal is tabled
and is incremented each time the subgoal is resolved using rules. Note that semi-naive
optimization may reduce the work of each iteration but has no effect on the number of
iterations needed to reach the fixpoint.

The necessity of re-evaluating looping subgoals has been blamed for the low
speed of iteration-based tabling systems (Zhou et al. 2000; Guo and Gupta 2001).
Our new findings indicate that re-evaluation is not a dominant factor for the
benchmarks. These statistics explain why an implementation of linear tabling can
achieve speed comparable with SLG-WAM for the benchmarks.

Table 4. Statistics on iterations.

program     #subgoals    max its.    ave. its.
tcl         1            2           2.00
tcr         51           2           1.96
tcn         51           2           1.98
sg          153          2           1.32
cs_o        76           2           1.14
cs_r        76           2           1.16
disj        74           2           1.20
gabriel     59           2           1.20
kalah       102          3           1.24
pg          48           2           1.13
peep        49           3           1.29
read        131          5           1.34
atr         7139         6           1.81

6.5 The complexity issue

The following is a slightly changed version of Warren's example which disables
semi-naive optimization:

:- table p/2.
p(X,Y) :- q(X,Z),c(Z,a,Y).
p(X,Y) :- q(X,Z),c(Z,b,Y).
p(X,X).

q(X,Y) :- p(X,Y).

Since the last depending subgoals q(X,Z) in p/2 are not tabled, semi-naive
optimization cannot be applied to p/2. For a string $(ab)^{n/2}$, the query p(0,n) needs
n/2 iterations to reach the fixpoint. Since in each iteration the subgoal q(X,Z)
is rewritten into p(X,Z) which returns all existing answers, the total time taken
is $O(n^2)$. In contrast, the program takes only $O(n)$ time under SLG. For the size
n=5000, it took BP 3.5 seconds to run the program while it took XSB only 15
milliseconds. For the original version of the program, to which semi-naive optimization is
applicable, it took BP only 7 milliseconds.



7 Related Work 

There are three different tabling schemes, namely OLDT and SLG (Tamaki and Sato 1986;
Sagonas and Swift 1998), CAT (Demoen and Sagonas 1998; Somogyi and Sagonas 2006),
and iteration-based tabling including linear tabling (Shen et al. 1999; Shen et al. 2001;
Zhou et al. 2000; Zhou and Sato 2003; Zhou et al. 2004) and DRA (Guo and Gupta 2001).
SLG (Chen and Warren 1996) is a formalization based on OLDT for computing
well-founded semantics for general programs with negation. The basic idea of using
iterative deepening to compute fixpoints dates back to the ET* algorithm
(Dietrich 1987).

In SLG-WAM, a consumer fails after it exhausts all the existing answers and
its state is preserved by freezing the stack so that it can be reactivated after new
answers are generated. The CAT approach does not freeze the stack but instead
copies the stack segments between the consumer and its producer into a separate
area so that backtracking can be done normally. The saved state is reinstalled after a
new answer is generated. CHAT (Demoen and Sagonas 1999) is a hybrid approach
that combines SLG-WAM and CAT.

Linear tabling relies on iterative computation of looping subgoals to compute 
fixpoints. Linear tabling is probably the easiest scheme to implement since no effort 
is needed to preserve states of consumers and the garbage collector can be kept 
untouched for tabling. Linear tabling is also the most space-efficient scheme since 
no extra space is needed to save states of consumers. Nevertheless, linear tabling 
without optimization could be computationally more expensive than the other two
schemes. 

The DRA method (Guo and Gupta 2001) is also iteration based, but it identifies
looping clauses dynamically and iterates the execution of looping clauses to
compute fixpoints. While in linear tabling iteration is performed on only top-most
looping subgoals, in DRA iteration is performed on every looping subgoal. In ET*
(Dietrich 1987), every tabled subgoal is iterated even if it does not occur in a
loop. Besides the difference in answer consumption strategies and optimizations,
the linear tabling scheme described in this paper differs from the original version
(Zhou et al. 2000; Shen et al. 2001) in that followers fail after they exhaust their
answers rather than steal their pioneers' choice points. This strategy was originally
adopted in the DRA method.

The two consumption strategies have been compared in XSB (Freire et al. 1998)
as two scheduling strategies. The lazy strategy is called local scheduling and the 
eager strategy is called single-stack scheduling. Another strategy, called batched 
scheduling, is similar to local scheduling but top-most looping subgoals do not have 
to wait until their clusters become complete to return answers. Their experimental 
results indicate that local scheduling constantly outperforms the other two strate- 
gies on stack space and can perform asymptotically better than the other two 
strategies on speed. The superior performance of local scheduling is attributed to 
the saving of freezing stack segments. Although our experiment confirms the good 
space performance of the lazy strategy, it gives a counterintuitive result that the 
eager strategy is as fast as the lazy strategy. This result implies that the cost of 
iterative evaluation is considerably smaller than that of freezing stack segments, 
and for predicates with cuts the eager strategy can be used without significant 
slow-down. In our tabling system, different answer consumption strategies can be 
used for different predicates. The tabling system described in (Rocha et al. 2003a)
also supports mixed strategies.

Semi-naive optimization is a fundamental idea for reducing redundancy in bottom-up
evaluation of logic database queries (Bancilhon and Ramakrishnan 1986; Ullman 1988).
As far as we know, its impact on top-down evaluation had been unknown before
(Zhou et al. 2004). OLDT (Tamaki and Sato 1986) and SLG (Sagonas and Swift 1998)
do not need this technique since they are not iterative and the underlying delaying
mechanism successfully avoids the repetition of any derivation step. An attempt
has been made by Guo and Gupta (Guo and Gupta 2001) to make incremental
consumption of tabled answers possible in DRA. In their scheme, answers are also
divided into three regions but answers are consumed incrementally as in bottom-up
evaluation. Since no condition is given for the completeness and no experimental
result is reported on the impact of the technique, we are unable to give a detailed
comparison.

Our semi-naive optimization differs from the bottom-up version in two major 
aspects: Firstly, no differentiated rules are used. In the bottom-up version differen- 
tiated rules are used to ensure that at least one new answer is involved in the join 
of answers for each rule. Consider, for example, the clause: 

H :- P, Q.

The following two differentiated rules are used in the evaluation instead of the
original one:

H :- ΔP, Q.
H :- P, ΔQ.

where ΔP denotes the new answers produced in the previous round for P. Using
differentiated rules in top-down evaluation can cause considerable redundancy,
especially when the body of a clause contains non-tabled subgoals.
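
For comparison, the bottom-up use of differentiated rules can be sketched in Python. The example below is a textbook-style illustration, not code from any of the systems discussed: it assumes that H, P and Q all stand for the relation tcn of Figure 3, so that the two differentiated rules become "tcn :- Δtcn, tcn" and "tcn :- tcn, Δtcn". The function names are hypothetical.

```python
def compose(r, s):
    """Join two binary relations on the shared middle argument."""
    return {(x, w) for (x, y) in r for (z, w) in s if y == z}


def seminaive_closure(edges):
    """Bottom-up semi-naive evaluation of
           tcn(X,Y) :- edge(X,Y).
           tcn(X,Y) :- tcn(X,Z), tcn(Z,Y).
    Returns the computed relation and the number of rounds performed."""
    total = set(edges)      # all answers found so far
    delta = set(edges)      # answers that were new in the previous round
    rounds = 0
    while delta:
        rounds += 1
        # Each join involves at least one new answer.
        derived = compose(delta, total) | compose(total, delta)
        delta = derived - total
        total |= delta
    return total, rounds


chain = {(1, 2), (2, 3), (3, 4)}
closure, rounds = seminaive_closure(chain)
```

Every derivation in a round uses at least one answer from `delta`, which is what the differentiated rules guarantee; pairs of old answers are never joined again.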

The second major difference between our semi-naive optimization and the bottom- 
up version is that answers in our method are consumed sequentially until exhaus- 
tion, not incrementally as in bottom-up evaluation. A tabled subgoal consumes 
either all available answers or only new answers including answers produced in 
the current round. Neither incremental consumption nor sequential consumption 
seems satisfactory. Incremental consumption avoids redundant joins but may
require more rounds to reach fixpoints. In contrast, sequential consumption never
needs more rounds to reach fixpoints but may cause redundant joins of answers.
The early promotion technique alleviates the problem of sequential consumption. 
By promoting answers early from the current region to the previous region, we can 
considerably reduce the redundancy in joins. 

Semi-naive optimization may lower time complexities in bottom-up evaluation
(Bancilhon and Ramakrishnan 1986). The same result holds for the top-down
version, as demonstrated by Warren's example. Our experimental results show that
semi-naive optimization gives an average speed-up of over 30% to linear tabling if 
answers are promoted early, and almost no speed gain if no answer is promoted 
early. In linear tabling, only looping subgoals need to be iteratively evaluated. For 
non-looping subgoals, no re-evaluation is necessary and thus semi-naive optimiza- 
tion has no effect at all on the performance. Most of the looping subgoals in our 
chosen benchmarks reach their fixpoints after 2-3 iterations. In general, more it- 
erations are needed to reach fixpoints in bottom-up evaluation. In addition, in 
bottom-up evaluation, the order of the joins can be optimized and no further joins 
are necessary once a participating set is known to be empty. In contrast, in linear 
tabling joins are done in strictly chronological order. For a conjunction (P,Q,R), 
the join P ⋈ Q is computed even if no answer is available for R. Because of all these
factors, semi-naive optimization is not as effective in linear tabling as in bottom-up 
evaluation. 
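
The effect of strictly chronological joins can be seen in a small sketch. The
predicates h_chrono/1 and h_reordered/1, the counter steps/1 with its helper
tick/0, and the facts for p/2 and q/2 are all hypothetical and serve only to
count how many p-q pairs are built when r/1 has no answers. Plain Prolog
resolution is used here, so the sketch illustrates the ordering issue only, not
tabling itself.

```prolog
:- dynamic r/1, steps/1.

steps(0).

% tick/0 increments the counter each time a p-q pair has been joined.
tick :- retract(steps(N)), !, N1 is N + 1, assertz(steps(N1)).

p(1, a).
p(2, b).
q(a, x).
q(b, y).
% r/1 deliberately has no answers.

% Left-to-right (chronological) order: every p-q pair is joined
% before r is tried and found to be empty.
h_chrono(X) :- p(X, Y), q(Y, Z), tick, r(Z).

% Hypothetical reordering of the kind a bottom-up evaluator could choose:
% the empty relation is examined first, so no pair is ever joined.
h_reordered(X) :- r(Z), q(Y, Z), tick, p(X, Y).
```

After the query h_chrono(_) fails, steps(N) gives N = 2, one for each p-q pair.
Calling h_reordered(_) afterwards also fails but leaves the counter unchanged.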

Our semi-naive optimization requires the identification of last depending sub- 
goals. For this purpose, a level mapping is used to represent the call graph of a 
given program. The use of a level mapping to identify optimizable subgoals is
analogous to the idea used in the stratification-based methods for evaluating logic
programs (Apt et al. 1988; Chen and Warren 1996; Przymusinski 1989). In our level
mapping, only predicate symbols are considered. It is expected that more accurate 
approximations can be achieved if arguments are considered as well. 

Semi-naive optimization does not solve all the problems of recomputation in 
linear tabling. Recall Warren's example:
:- table p/2.

p(X,Y) :- p(X,Z), c(Z,a,Y).
p(X,Y) :- p(X,Z), c(Z,b,Y).
p(X,X).

Assume there is a very costly non-tabled subgoal preceding p(X,Z). Then the subgoal
has to be executed in each iteration, even with semi-naive optimization. This example
demonstrates the acuteness of the problem of recomputation because the number
of iterations needed to reach the fixpoint is not constant. One treatment would be to
table the subgoal to avoid recomputation, as suggested in (Guo and Gupta 2001),
but tabling extra predicates can cause other problems such as overconsumption of
table space.
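
A minimal sketch of this treatment is given below. The predicate costly/1 is a
hypothetical stand-in for the expensive subgoal, and the facts for c/3 are made
up for the illustration; neither comes from the benchmarks in this paper. Once
costly/1 is tabled, later iterations can look up its answer in the table
instead of executing it again, at the price of the extra table entries.

```prolog
:- table p/2.
:- table costly/1.      % assumption: the expensive subgoal is tabled as well

p(X,Y) :- costly(X), p(X,Z), c(Z,a,Y).
p(X,Y) :- costly(X), p(X,Z), c(Z,b,Y).
p(X,X).

% costly/1: hypothetical placeholder for an expensive computation.
costly(X) :- integer(X).

% Hypothetical facts for c/3.
c(1,a,2).
c(2,b,3).
```

With these facts, the query p(1,Y) yields the answers Y = 1, Y = 2, and Y = 3,
in an order that depends on the tabling system.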

8 Conclusion 

In this paper we have described two answer consumption strategies (namely, lazy 
and eager strategies) and semi-naive optimization for linear tabling. We have com- 
pared the two strategies both qualitatively and quantitatively. Our results indicate 
that, while the lazy strategy has better space efficiency than the eager strategy, the 
eager strategy is comparable in speed with the lazy strategy. This result implies 
that for all-solution search programs the lazy strategy should be adopted and for 
partial-solution search programs including programs with cuts the eager strategy 
should be used. 

We have tailored semi-naive optimization to linear tabling and have given
sufficient conditions for it to be complete. Moreover, we have proposed a technique
called early answer promotion to reduce redundant consumption of answers. Our
experimental results indicate that semi-naive optimization gives significant speed-ups
to some programs.

Linear tabling has several attractive advantages including its simplicity, ease of 
implementation, and good space efficiency. Early implementations of linear tabling 
were several times slower than XSB. This paper has demonstrated for the first time
that linear tabling with optimization is competitive with SLG in time efficiency as
well on the benchmarks used.

Semi-naive optimization does not solve all the problems of recomputation in linear 
tabling. There are programs for which recomputation can be costly, even leading to 
higher complexities. Future work is to identify the patterns of such programs
and find methods to deal with them.